Identifying and marking failed egress links in data plane

ABSTRACT

A method of identifying a failed egress path of a hardware forwarding element. The method detects an egress link failure in a data plane of the forwarding element. The method generates a link failure signal in the data plane identifying the failed egress link. The method generates a packet that includes the identification of the egress link based on the link failure signal. The method sets the status of the egress link to failed in the data plane based on the identification of the egress link in the generated packet.

CLAIM OF BENEFIT TO PRIOR APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 16/903,305, filed Jun. 16, 2020, now U.S. Pat. No. 11,310,099, which is a continuation application of U.S. patent application Ser. No. 16/048,202, filed Jul. 27, 2018, which is a continuation application of U.S. patent application Ser. No. 15/150,015, filed May 9, 2016. U.S. patent application Ser. Nos. 15/150,015, 16/048,202 and 16/903,305 claim the benefit of U.S. Provisional Patent Application 62/292,498, filed Feb. 8, 2016. The entire specifications of which are hereby incorporated herein by reference in their entirety.

BACKGROUND

A forwarding element such as a switch or a router can often send packets to a destination through several different egress paths. The forwarding elements utilize different algorithms to identify the best path on which to send the packets in order to minimize network congestion as well as transmission time.

Once one of these egress paths fails, the forwarding element has to be notified that the path has failed and must mark the path as failed in order to avoid forwarding packets on the failed path. A path may fail due to a port or a wire failure inside the forwarding element, or due to a path failure several hops away between the forwarding element and a packet destination.

A typical solution for keeping track of failed paths is to use software in the control plane of the forwarding element to track the status of the configured paths and mark a path as failed as soon as the path becomes unavailable. Utilizing software to keep track of and update the list of failed paths is, however, slow. Depending on the load of the processor that is executing the software, marking a path as failed by software may take several milliseconds. Such a delay is undesirable and can cause significant disruption in a high-speed forwarding element.

BRIEF SUMMARY

Some embodiments provide a hardware forwarding element (e.g., a hardware switch or a hardware router) with a novel packet-processing pipeline that quickly marks a failed egress path by performing a set of hardware and firmware operations in the data plane. The forwarding element in some embodiments includes an ingress pipeline, a traffic manager, and an egress pipeline. Each of the ingress and egress pipelines includes a parser, a match-action unit (MAU), and a deparser.

The parser receives the packets coming into the pipeline and produces a packet header vector (PHV) as its output. The PHV provides the input data to the match tables of the MAU. The MAU includes a set of match-action stages. Each of these stages matches a particular set of header fields included in the PHV against a match table and takes an action based on the result of the match. The output PHV is then handed to the deparser, which reassembles the packet by putting back together the output PHV and the payload of the packet that the deparser receives directly from the parser.

The forwarding element also includes a packet generator that is capable of generating packets inside the forwarding element and placing them in the packet pipeline. The packet generator receives the identification of failed paths or ports. For instance, when a port or a wire inside the forwarding element fails, some embodiments generate an interrupt that provides the identification of the failed port (or path). The packet generator in some embodiments also utilizes mechanisms such as keepalive to determine failed paths that are several hops away. Once the packet generator receives the identification of a failed link (i.e., a failed port or a failed path), the packet generator generates a packet that includes the identification of the failed link in a predetermined location in the packet header. The packet goes through the MAU pipeline and matches a predefined match field. The action corresponding to the match field causes an action unit in the forwarding element to use the failed link identification to compute an index to the status bit of the failed link in a data structure and to set the status bit to off (i.e., to indicate that the link has failed).

Some embodiments utilize a process to mark an egress link (i.e., a path or a port) as failed by performing a set of operations that are done by dedicated hardware and firmware in the data plane of the forwarding element. The process receives an indication that an egress link (i.e., a path or a port) of the forwarding element has failed. The process then generates a packet inside the forwarding element and includes an identification of the failed link (i.e., the failed path or port) in a predetermined field of the packet header.

The process then places the packet in the packet pipeline of the forwarding element. The process parses the packet, places the identification of the failed link in a register of the PHV, and forwards the PHV to the MAU. The process matches the identification of the failed link in the PHV with the match field of a match-action entry that is preprogrammed to match the link's identification. Each match field has a corresponding action.

Once the identification of the failed link matches a match field, the process uses an arithmetic logic unit (ALU) to perform the corresponding action of the match-action entry. The process determines the location of the link's status bit in a data structure (e.g., a link status table or a port status table) that keeps track of live and failed links. The process sets the bit at the determined location to off (or failed). The data structure is stored in a dual port memory that is capable of being written directly by hardware. Once the status bit of the failed link is updated, the packet is no longer needed and is dropped.

The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description, and Drawings is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description, and Drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.

FIG. 1 conceptually illustrates a block diagram of a hardware forwarding element and a block diagram of an ingress/egress pipeline of the hardware forwarding element in some embodiments.

FIG. 2 illustrates ECMP routing for forwarding packets from a forwarding element to a destination over several different paths.

FIG. 3 illustrates link aggregation as another example of forwarding packets from a forwarding element to a destination over several different paths.

FIG. 4A conceptually illustrates a logical view of a vector that shows the status of the egress links of a forwarding element in some embodiments.

FIG. 4B conceptually illustrates an implementation of the logical vector of FIG. 4A.

FIG. 5 conceptually illustrates a block diagram of a hardware forwarding element that is capable of marking failed links by performing a set of hardware operations in the data plane in some embodiments.

FIG. 6 conceptually illustrates a portion of a hardware forwarding element used for detecting a port failure and reporting the failure to the packet generator in some embodiments.

FIG. 7 conceptually illustrates a grid of unit memories in some embodiments.

FIG. 8 conceptually illustrates a process for assigning status bits to a forwarding element's egress links and programming match-action entries to set the status of a failed link to failed.

FIG. 9 conceptually illustrates the steps that the hardware forwarding element of FIG. 5 takes to mark a failed link in the data plane in some embodiments.

FIG. 10 conceptually illustrates generation of a PHV by a parser from a packet that is generated by a packet generator in some embodiments.

FIG. 11 conceptually illustrates a process that a forwarding element performs in the data plane in order to set the status of a failed link to failed in some embodiments.

FIG. 12 conceptually illustrates a port status table of some embodiments maintained in dual port memory that is writable by hardware.

FIG. 13 conceptually illustrates a process for assigning backup egress ports for a forwarding element and programming match-action entries to set the status of a failed port to failed.

FIG. 14 conceptually illustrates the steps a hardware forwarding element takes to mark a failed port in the data plane in some embodiments.

FIG. 15 conceptually illustrates the steps a hardware forwarding element takes to replace a failed primary egress port with a backup port in the data plane in some embodiments.

FIG. 16 conceptually illustrates a process that a forwarding element performs in the data plane in order to set the status of a failed port to failed in some embodiments.

FIG. 17 conceptually illustrates an electronic system with which some embodiments of the invention are implemented.

DETAILED DESCRIPTION

In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.

Software defined networks (SDNs) decouple the data and control planes. The data plane, which is also referred to as the forwarding plane or user plane, is the part of the network that carries data packet (i.e., user packet) traffic. In contrast, the control plane in a network controls signaling traffic and routing.

In a forwarding element (e.g., a hardware switch or a hardware router), the data plane is the part of the architecture that decides what to do with the packets that arrive at the ingress interface. The data plane of a forwarding element is implemented by hardware and firmware, while the control plane is implemented in software to provide more flexible management of network components from a central location. Keeping track of failed paths by the software in the control plane could, however, be time consuming and slow.

Some embodiments provide a hardware forwarding element with a novel packet-processing pipeline that quickly marks a failed egress link by performing a set of hardware operations in the data plane. In the following discussions, the term link is used to refer to a path or a port. The hardware forwarding element of some embodiments includes, among other elements, an ingress pipeline and an egress pipeline. Each of these pipelines includes a parser, a match-action unit (MAU), and a deparser.

FIG. 1 conceptually illustrates a block diagram of a hardware forwarding element 105 and a block diagram of an ingress or egress pipeline 145 of the hardware forwarding element in some embodiments. As shown, the forwarding element 105 includes an ingress pipeline (or data path) 110, a traffic manager 115, and an egress pipeline 120.

The traffic manager 115 has several components such as a queuing and buffering system, a packet replicator, and a port failure feedback generator. These components are described further below. The ingress pipeline 110 receives packets 125 from a set of channels (e.g., through a set of I/O modules), parses each packet header into a packet header vector (PHV), sends the PHV through a set of match and action stages which may modify the PHV, deparses the packet headers back from the PHV into packet format, and queues the packet in a centralized data buffer (i.e., a data buffer provided by the traffic manager 115). Each one of these operations is described in more detail below by reference to the pipeline 145. The block diagram of both the ingress pipeline 110 and the egress pipeline 120 is similar to the pipeline 145.

In some embodiments, the traffic manager 115 receives the packets that are processed by the ingress pipeline and provides a large shared buffer (storage) that accommodates the queuing delays due to oversubscription of the output channels of the ingress deparser. In some embodiments, the data buffer stores packet data, while pointers to that data are kept in different queues per channel. Each channel in turn requests data from the common data buffer using a configurable queuing policy. When pointers to packets reach the head of the queues, the packets are read out of the data buffer of the traffic manager 115 into the egress pipeline 120.

The egress pipeline 120 receives the packets from the traffic manager 115. The parser in the egress pipeline separates the packet payload from the packet headers, stores the packet headers in a PHV, sends the PHV through a set of match and action stages, deparses the packet headers back from the PHV into packet format, and sends the packets 130 to an appropriate output port of the forwarding element 105 to be driven off the forwarding element (e.g., through one of the output channels). An output packet may be the same packet as the corresponding input packet (i.e., with identical packet headers), or it may have different packet headers compared to the input packet based on the actions that are applied to the packet headers in the ingress and egress pipelines (e.g., different header field values for certain header fields and/or different sets of header fields).

It should be understood that the illustrated blocks in forwarding element 105 are exemplary only. The ingress, traffic manager, and egress blocks are simplified for ease of description. For example, although the figure shows only one entry point to the ingress parser and one exit point from the egress deparser, in some embodiments the input signals are received by many different input channels (e.g., 64 channels) and the output signals are sent out of the forwarding element from different output channels (e.g., 64 channels). Additionally, although for the illustrated forwarding element only one parser interface is shown for the ingress/egress pipeline 145, some embodiments employ numerous parser blocks (e.g., 16 parser blocks) that feed a match-action unit (MAU) in each pipeline.

FIG. 1 also shows a block diagram 145 of an interface of the hardware forwarding element 105. Each one of the ingress 110 and egress 120 pipelines uses an interface similar to the interface 145. The interface includes a pipeline with three different units, namely a parser unit 150, an MAU 155, and a deparser unit 160. The parser 150 of some embodiments receives the incoming packets and produces a packet header vector (PHV) as its output. In other words, the parser 150 separates the packet headers from the packet payload by extracting different fields of the packet headers and storing them in the PHV.

In some embodiments the PHV includes a set of different size registers or containers. For instance, in some embodiments the PHV includes sixty-four 8-bit registers, ninety-six 16-bit registers, and sixty-four 32-bit registers (for a total of 224 registers containing 4096 bits). Other embodiments may have different numbers of registers of different sizes. In some embodiments, the parser 150 stores each extracted packet header in a particular subset of one or more registers of the PHV. For example, the parser might store a first header field in one 16-bit register and a second header field in a combination of an 8-bit register and a 32-bit register (e.g., if the header field is 36 bits long).
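
As a rough illustration of these container counts, the following C sketch models a PHV as three arrays of containers and checks the arithmetic above. The structure and names are illustrative assumptions only and do not correspond to the actual hardware layout.

#include <assert.h>
#include <stdint.h>

/* Illustrative model of a PHV with the container counts described above. */
typedef struct {
    uint8_t  c8[64];    /* sixty-four 8-bit containers  =  512 bits */
    uint16_t c16[96];   /* ninety-six 16-bit containers = 1536 bits */
    uint32_t c32[64];   /* sixty-four 32-bit containers = 2048 bits */
} phv_t;

int main(void) {
    /* 512 + 1536 + 2048 = 4096 bits spread over 64 + 96 + 64 = 224 containers. */
    assert(64 * 8 + 96 * 16 + 64 * 32 == 4096);
    assert(64 + 96 + 64 == 224);
    return 0;
}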

The PHV provides the input data to the match tables of the MAU. In some embodiments the MAU 155 includes a set of match-action stages (e.g., 32 match-action stages). Each of these stages matches a particular set of header fields against a match table and takes an action based on the result of the match (e.g., assigning the packet to an output port and queue, dropping the packet, modifying one or more of the header fields, etc.). Based on the actions taken on different header data during the different stages of the MAU 155, the PHV that the MAU outputs might include the same header data as the PHV that the MAU received from the parser, or the output PHV might contain different data than the input PHV.

The output PHV is then handed to the deparser 160. The deparser 160 reassembles the packet by putting back together the output PHV (that might or might not have been modified) that the deparser receives from the MAU 155 and the payload of the packet that the deparser receives directly from the parser 150. The deparser then sends the packets 140 out of the ingress/egress pipeline (to the traffic manager 115 or out of the forwarding element, depending on whether it is the deparser for the ingress pipeline or the egress pipeline).

I. Identifying and Marking Failed Links in Data Plane

Forwarding a packet from a forwarding element to a destination that is several hops away can often be done over several different paths. Once a path is determined to have failed, an alternative path with the same cost (or a path with the least possible cost) is selected to replace the failed path. One such example is equal-cost multi-path (ECMP) routing. Another example is link aggregation (LAG).

A. Forwarding the Packets using ECMP

ECMP is a routing strategy that selects the next hop for forwarding a packet to the final destination in such a way as to minimize the overall cost (e.g., the required time or the network congestion) of forwarding the packet to the final destination. FIG. 2 illustrates ECMP routing for forwarding packets from a forwarding element 205 to a destination 210 over several different paths through several hops 240-260 that can also be forwarding elements. The figure is shown in two stages 201 and 202. The cost of sending a packet through each path is written next to the path.

As shown, there are several paths, such as A-B-E-G, A-C-E-G, and A-D-F-G, between source A 205 and destination G 210 that cost 6 units. Each one of these paths is, e.g., a separate open system interconnection (OSI) Layer 3 (L3) path through which packets can be sent. In stage 201 the path A-B-E-G (as shown by arrow 215) is utilized to send packets for one or more flows between source A 205 and destination G 210. As shown, multiple paths can be on the same OSI Layer 2 (L2) port of a forwarding element. For instance, in FIG. 2, both paths A-C-E-G and A-C-G are on port 230 of forwarding element 205.

In stage 202, the path between hops B 240 and E 245 fails. According to the ECMP strategy, another route between the source 205 and the destination 210 is selected to keep the transmission cost at a minimum. As shown, forwarding element A 205 selects the path A-C-E-G 220 to replace path 215.

B. Forwarding the Packets using LAG

FIG. 3 illustrates LAG as another example of forwarding packets from a forwarding element to a destination over several different paths. LAG combines multiple network connections in parallel to provide throughput and redundancy. The figure is shown in two stages 301 and 302. The cost of sending a packet through a path is written next to each path. As shown, there are several paths between forwarding element A 305 and hop B 330 (which could be another forwarding element) that have equal cost. These paths, e.g., use OSI Layer 2 (L2) ports on forwarding element 305 that are on one logical channel bundle. These paths provide parallelism to increase throughput as well as redundancy.

As shown in stage 301, the path A-B-D-E 315, which passes through link 335 between port 340 of forwarding element 305 and hop 330, is used to pass packets for one or more flows from forwarding element A 305 to destination E 310. In stage 302, port 340 fails. As a result, link 335 becomes inaccessible. As shown, another path 320 (which includes the link 345 between port 350 of forwarding element 305 and hop 330) is selected to replace the failed path 315.

In addition to the examples of ECMP and LAG, it is possible that several tunnels go through the same egress port of the forwarding element. Even if the port remains functional, one of the tunnels may fail several hops away downstream. Similar to the examples of FIGS. 2 and 3, the failed path has to be replaced with another path despite the fact that the egress port is still operational.

C. Link Status Table

Some embodiments maintain the status of each egress link of a forwarding element in a data structure that includes a flag (e.g., one bit) per link. The value of the bit indicates whether the corresponding link is up or down. For instance, in some embodiments a value of 1 indicates that the corresponding link is operational and a value of 0 indicates that the corresponding link is down. In other embodiments, a value of 0 may be used to indicate that a link is operational and a value of 1 to indicate that a link is down.

FIG. 4A conceptually illustrates a logical view of a data structure (e.g., a vector) that shows the status of the egress links of a forwarding element in some embodiments. As shown, vector 405 is an array of n bits. Each bit corresponds to a configured egress link (i.e., a port or a path) of the forwarding element. The status of each link is represented by the value of the corresponding bit. When a link is up and operational, the corresponding bit is set to on (e.g., is set to 1) to indicate that the link is live. On the other hand, when a link is down, the corresponding bit is set to off (e.g., is set to 0) to indicate that the link has failed and is not available. Vector 405 in some embodiments is stored in memory as a group of one or more words.
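
The following C sketch illustrates, under an assumed 64-bit word width and illustrative names, how a per-link status bit of such a vector can be located and cleared given a link identifier. It is a conceptual model, not the hardware implementation.

#include <stdint.h>
#include <stdio.h>

#define NUM_LINKS 1024
#define WORD_BITS 64

/* Live link vector: one bit per configured egress link, 1 = up, 0 = failed. */
static uint64_t live_link_vector[NUM_LINKS / WORD_BITS];

/* Mark a link as failed by clearing its status bit. */
static void mark_link_failed(uint32_t link_id) {
    live_link_vector[link_id / WORD_BITS] &= ~(1ULL << (link_id % WORD_BITS));
}

/* Check whether a link is still live. */
static int link_is_live(uint32_t link_id) {
    return (live_link_vector[link_id / WORD_BITS] >> (link_id % WORD_BITS)) & 1;
}

int main(void) {
    for (uint32_t w = 0; w < NUM_LINKS / WORD_BITS; w++)
        live_link_vector[w] = ~0ULL;               /* all links start out live */
    mark_link_failed(42);
    printf("link 42 live? %d\n", link_is_live(42)); /* prints 0 */
    return 0;
}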

FIG. 4B conceptually illustrates an implementation of the logical vector 405 of FIG. 4A. As shown, some embodiments utilize a link status table in an area of memory 410 (referred to herein as the live link vector table) for storing the status of the links. The memory used to store table 410 in some embodiments is a dual port memory that is capable of being read and written by hardware. The dual port memory is also capable of being written by software. In contrast, a random access memory (RAM) is read by hardware but is written only by software. For instance, the software writes into a buffer, which is in turn transferred into the RAM.

The dual port memory used to store the live link vector table 410 in some embodiments is implemented from single port static random-access memory (SRAM) units. These embodiments utilize a map RAM (e.g., a small SRAM of 1024 entries by 11 bits) for each unit SRAM. The map RAM stores whether the corresponding unit SRAM has the most up-to-date data for a memory address. Simultaneous read and write operations are performed as follows.

The read operation is performed by (1) presenting the address to read to all map RAMs, (2) the map RAM with the data to be read signaling that its associated unit (e.g., SRAM S1) holds the most up-to-date data, and (3) reading the unit SRAM S1 at the corresponding address. Since the write operation cannot be performed on the same unit where the data currently resides (because the single port of SRAM S1 is occupied by a read), the write is performed by (1) querying the map RAMs to determine which unit SRAM is not busy and has the specified address available for a write operation, (2) writing the data to the free SRAM (e.g., SRAM S2), (3) updating the map RAM associated with unit SRAM S2 to indicate that unit SRAM S2 has the most up-to-date version of the data, and (4) updating the map RAM associated with unit SRAM S1 to indicate that the address in SRAM S1 is now available for write operations (since the data in SRAM S1 is now stale).
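
The following C sketch models the read/write procedure described above for a pair of single port unit SRAMs and their map RAMs. The two-unit configuration, word width, and function names are assumptions made for illustration only.

#include <stdint.h>
#include <stdio.h>

#define NUM_UNITS 2      /* two single-port unit SRAMs backing one logical word */
#define NUM_ADDRS 1024

static uint64_t unit_sram[NUM_UNITS][NUM_ADDRS];
/* map_ram[u][a] is 1 when unit u holds the most up-to-date copy of address a. */
static uint8_t  map_ram[NUM_UNITS][NUM_ADDRS];

/* Read: query the map RAMs and read from the unit that owns the address. */
static uint64_t dp_read(uint32_t addr) {
    for (int u = 0; u < NUM_UNITS; u++)
        if (map_ram[u][addr])
            return unit_sram[u][addr];
    return 0;            /* address never written */
}

/* Write: use a unit that does NOT own the address (its port is free),
 * then flip ownership so the old copy becomes stale. */
static void dp_write(uint32_t addr, uint64_t value) {
    for (int u = 0; u < NUM_UNITS; u++) {
        if (!map_ram[u][addr]) {
            unit_sram[u][addr] = value;
            map_ram[u][addr] = 1;                   /* new owner of the address */
            map_ram[(u + 1) % NUM_UNITS][addr] = 0; /* old copy is now stale    */
            return;
        }
    }
}

int main(void) {
    dp_write(7, 0xFFFF);   /* first write lands in unit 0 */
    dp_write(7, 0xFFFE);   /* concurrent-style update lands in unit 1 */
    printf("addr 7 = 0x%llx\n", (unsigned long long)dp_read(7)); /* 0xfffe */
    return 0;
}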

As shown, table 410 includes several groups of live link vectors. Each group is used by one application (or one user). For instance, group 415 includes several live link vectors (e.g., 128 bits each). Group 415 maintains the status of the links used by one application that utilizes a forwarding element such as forwarding element 105 in FIG. 1.

Once a link such as path 215 in FIG. 2 or port 340 in FIG. 3 fails, a typical solution in prior art forwarding elements is for software in the control plane to mark the link as failed and select an alternative link to replace the failed link. Utilizing software to mark a link as failed and determine a replacement link is, however, time consuming and slow. For instance, marking the link as failed by software may take several milliseconds. Accordingly, some embodiments provide a technique to quickly mark a failed link by performing a set of hardware operations in the data plane (e.g., in the order of a few microseconds) and route packets to an alternative link without software involvement.
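
As an illustration of how a status vector that is kept current in the data plane enables such rerouting, the following C sketch selects a member of an ECMP or LAG group while skipping members whose status bit is off. The group layout, hash handling, and selection rule are assumptions made for illustration and are not the forwarding element's actual selection logic.

#include <stdint.h>
#include <stdio.h>

#define GROUP_SIZE 4

/* One ECMP/LAG group: member link IDs plus a per-member live bit.
 * In the forwarding element the live bits would come from the link status
 * table maintained in the data plane; here they are plain variables. */
typedef struct {
    uint32_t member[GROUP_SIZE];
    uint8_t  live[GROUP_SIZE];      /* 1 = up, 0 = failed */
} ecmp_group_t;

/* Pick a member for a flow hash, skipping members whose status bit is off. */
static int select_member(const ecmp_group_t *g, uint32_t flow_hash) {
    for (int i = 0; i < GROUP_SIZE; i++) {
        int candidate = (int)((flow_hash + i) % GROUP_SIZE);
        if (g->live[candidate])
            return candidate;
    }
    return -1;                      /* no live member left in the group */
}

int main(void) {
    ecmp_group_t g = { { 11, 12, 13, 14 }, { 1, 1, 1, 1 } };
    uint32_t hash = 2;
    printf("chosen link: %u\n", g.member[select_member(&g, hash)]); /* 13 */
    g.live[2] = 0;                  /* link 13 marked failed in the data plane */
    printf("chosen link: %u\n", g.member[select_member(&g, hash)]); /* 14 */
    return 0;
}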

D. Detecting and Marking a Failed Link

FIG. 5 conceptually illustrates a block diagram of a hardware forwarding element 505 that is capable of marking failed links by performing a set of hardware operations in the data plane in some embodiments. As shown, in addition to the ingress pipeline 110, traffic manager 115, and egress pipeline 120, the forwarding element includes a packet generator 510. The packet generator is capable of generating packets internally in the forwarding element and sending the packets through the packet pipeline. As shown, the ingress packets 125 are received at the ingress pipeline 110 through a set of ingress ports 545, while packets 515 that are generated by the packet generator are received at the ingress pipeline at a separate port 520.

As shown, packet generator 510 receives the identification 525 of failed links. For instance, when a forwarding element's port fails, some embodiments generate an interrupt that provides the identification of the failed port. The interrupt is used to provide the identification of the failed port to the packet generator. As another example, the packet generator may receive an identification of a failed path (such as path 215 in FIG. 2) when a portion of the path that is several hops away fails. For instance, the packet generator receives a hardware signal when the failure of a keepalive signal indicates that a portion of an egress path has failed.

FIG. 6 conceptually illustrates a portion of a hardware forwarding element used for detecting a port failure and reporting the failure to the packet generator in some embodiments. The figure shows traffic manager 115, several ingress pipelines 621-624 (each pipeline similar to pipeline 110 in FIG. 5), several egress pipelines 631-634 (each pipeline similar to pipeline 120 in FIG. 5), and several packet generators 611-614 (each packet generator similar to packet generator 510 in FIG. 5). Each packet generator 611-614 is associated with one ingress pipeline 621-624. For instance, packet generator 611 is associated with ingress pipeline 621.

The figure also shows several media access control (MAC) units 601-604 that monitor the ingress and egress ports. In some embodiments, one MAC unit is utilized for monitoring both the ingress and the egress ports of a pipeline. For instance, the blocks labeled MAC unit 601 next to the ingress pipeline 621 and the egress pipeline 631 are one MAC unit that is shown in FIG. 6 as two separate blocks for clarity. In other embodiments, separate MAC units are utilized to monitor the ingress and egress ports of each pipeline. Once an egress port fails, the corresponding MAC unit 601-604 informs traffic manager 115 using a hardware signal (as conceptually shown by arrow 660).

As shown, traffic manager 115 has several components: a queuing and buffering system 645, a packet replicator 650, and a failure feedback generator 640. As described above, the queuing and buffering system provides a large shared buffer that accommodates the queuing delays due to oversubscription of the output channels of the ingress deparser. Port failure feedback generator 640 receives a hardware signal from the MAC unit that detects a port failure.

In the example of FIG. 6, MAC unit 601 detects that the egress port (not shown) being monitored by the MAC unit has failed. MAC unit 601 sends a signal 660 to the port failure feedback generator 640. The port failure feedback generator 640 in turn generates a hardware signal (as conceptually shown by arrow 670) to the packet generator 611 connected to the ingress pipeline 621 and egress pipeline 631 that are associated with the failed port. The hardware signal includes the identification of the failed port. For instance, the port failure feedback generator in some embodiments identifies the failed port based on which MAC unit has reported the failure. In other embodiments, the signal from a MAC unit (e.g., a MAC unit that monitors several ports) to the failure feedback generator includes an identification of the failed port (e.g., in the form of n bits of information that uniquely identify the failed port). The failure feedback generator then sends a signal to the packet generator and includes the identification of the failed port (e.g., in the form of m bits of information that uniquely identify the failed port).

The packet generator 611 then generates a packet 670 that is placed in ingress pipeline 621. As described below, the packet 670 causes the status bit corresponding to the failed port to be set to off. All actions of detecting that a port has failed by a MAC unit (such as MAC unit 601), sending a signal from the MAC unit to the traffic manager 115, sending a signal from the traffic manager to a packet generator (such as packet generator 611), generating a packet (such as packet 670) by the packet generator, and setting the status bit of the failed port to off are done by hardware and firmware in the data plane of the forwarding element without using the control plane or software.

Referring back to FIG. 5, the figure shows one of the ingress pipelines, egress pipelines, and packet generators of FIG. 6. Once the packet generator 510 receives the identification of a failed link, the packet generator generates a packet 515 that includes the identification of the failed link in a predetermined location in the packet header. The packet goes through the MAU match-action stages and matches a predefined match field. The action corresponding to the match field causes a preprogrammed action unit in the forwarding element to use the failed link identification to compute an index to the status bit of the failed link in the live link vector table and to set the bit to off (i.e., to indicate that the link has failed).

The hardware forwarding element of some embodiments processes network packets according to a series of match-action tables that specify when to perform certain operations on the packets. The match-action tables include match entries that specify sets of match conditions that can be met by packets, and corresponding action entries that specify operations to perform on packets that meet the match conditions.

As an example, the match entry of a match-action table might match on the identification of a failed link. The corresponding action entry might specify that the status bit of the link in the live link vector table has to be set to off. As another example, a match-action table might match on the destination address of an ingress packet and specify an output port to which to send the packet. Different destination addresses (i.e., different match entries) correspond to output actions to different ports (i.e., different action entries) of the forwarding element.

In some embodiments, the forwarding element includes a set of unit memories (e.g., SRAM and/or ternary content-addressable memory (TCAM)). The unit memories implement a match-action table by having a first set of the unit memories store the match entries and a second set of the unit memories store the action entries. That is, for a particular match entry and the corresponding action entry, the match entry is stored in a first unit memory and the action entry is stored in a second unit memory.

Some embodiments arrange the unit memories in a grid of rows and columns, with horizontal and vertical routing resources that connect the unit memories to arithmetic logic units (ALUs), also referred to as action units, that read the data from the unit memories in order to perform the match and action operations. In some such embodiments, a first pool of unit memories within a grid (e.g., a set of one or more columns of the grid) is utilized for the match entries, and a second pool of unit memories within the grid is utilized for the action entries. Some embodiments assign other functions of the forwarding element to unit memories within the grid as well, including statistics, meters, state, ternary indirection, etc. In some embodiments, the match memories are segregated (assigned to a specific set of columns, such as those closest to the ALUs) while the remaining memories in the grid are used for implementing memories for other functions (statistics, meters, etc.).

Each match entry of some embodiments includes two portions: the set of match conditions for a packet to meet, and an address of the action entry to read when the set of match conditions is met by a packet. The address, in some embodiments, specifies both a memory page that indicates a unit memory within the grid of unit memories, and a location within that memory page.
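
The following C sketch models such a two-part match entry, pairing a match condition with the memory page and word of its action entry, and walks a small table to find the action address for a failed-link identification carried in the PHV. The field names and widths are illustrative assumptions.

#include <stdint.h>
#include <stdio.h>

/* Illustrative layout of a match entry as described above: the match
 * condition, plus the address (memory page + word) of the action entry
 * to read when the condition is met. */
typedef struct {
    uint32_t match_link_id;   /* match condition: identification of a link */
    uint8_t  action_page;     /* unit memory holding the action entry      */
    uint16_t action_word;     /* word location within that unit memory     */
} match_entry_t;

int main(void) {
    match_entry_t table[] = {
        { .match_link_id = 7, .action_page = 3, .action_word = 12 },
        { .match_link_id = 9, .action_page = 3, .action_word = 13 },
    };
    uint32_t failed_link = 9;   /* value carried in the PHV */
    for (unsigned i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].match_link_id == failed_link)
            printf("run action at page %u, word %u\n",
                   table[i].action_page, table[i].action_word);
    return 0;
}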

FIG. 7 conceptually illustrates a grid 700 of unit memories in some embodiments. Specifically, this example shows 96 unit memories arranged in 16 logical rows, with each row associated with an arithmetic logic unit (ALU) 715. The 16 logical rows are divided into two separate grids 705 and 710 of eight rows, having six columns in each of the two separate grids. It should be understood that the arrangement of memories shown in FIG. 7 is only one of many examples of the possible arrangements of unit memories to implement match-action tables in a forwarding element, and that the inventive concepts described herein are applicable to many such arrangements.

These unit memories, in some embodiments, each have a number of memory locations, or “words”, that can be read by the ALUs. The wiring that allows ALUs to read from several different rows is described in detail in U.S. Provisional Application 62/108,409, filed Jan. 27, 2015, which is incorporated herein by reference. As shown for one of the unit memories 720, each memory includes N locations, from Word 0 to Word N-1. In some embodiments, these locations each have a fixed width based on the specific unit memories used in the grid 700, such as 64 bits, 128 bits, 256 bits, etc. The ALUs 715 in some embodiments read one memory location per unit memory in a given clock cycle.

In some embodiments, each of the unit memories has a designated function. For instance, a first unit memory might store match entries, while a second unit memory stores the action entries that correspond to the match entries of the first unit memory. In addition, the unit memories may store other data for a match-action based forwarding element, including meters (used to measure data flow rates) and statistics (e.g., counters for counting packets, bytes, etc.).

Referring back to FIG. 5, the match-action table of the MAU includes a match entry to match the identification of each egress link. Matching the link identification and performing the corresponding action (if there is a match) is done by one of the ALUs. The corresponding action entry causes the ALU to use the failed link identification included in the packet and compute an index to the status bit of the failed link in the live link vector table. The action also causes the ALU to set the bit to off (i.e., to indicate that the link has failed). After the live link vector table is updated, the packet is not needed and is dropped without being sent out from one of the egress ports 550.

FIG. 8 conceptually illustrates a process 800 for assigning status bits to a forwarding element's egress links and programming match-action entries to set the status of a failed link to failed. Process 800 in some embodiments is performed when the hardware forwarding element is deployed and an initial set of egress links is configured. The process is also performed each time a new link is configured in order to update the match-action table.

As shown, the process assigns (at 805) a status bit in the link status table (e.g., the live link vector table 410 in FIG. 4) for each configured egress link of the forwarding element. As described above, the link status table in some embodiments is stored in dual port memory that is capable of being written by either hardware or software. The process also optionally sets the status of all links to operational (e.g., sets the status bits to 1).

For each configured egress link, the process creates (at 810) a match field in a match-action table of the MAU to match the identification of the link. Next, for each configured egress link, the process creates (at 815) the corresponding action to (i) determine the location of the link's status bit in the link status table based on the link identification in a predetermined field of the packet header, (ii) set the status of the link in the link status table to failed (e.g., to set the bit to 0), and (iii) drop the packet after the bit in the link status table is updated. The process then ends.
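
The following C sketch gives a conceptual model of the configuration performed by process 800: each configured link receives a status bit (initially live) and a table entry mapping its identification to that bit. In practice this programming is expressed in a data-plane language as described in the next paragraph, so the sketch, including its names and layout, is only an illustrative approximation.

#include <stdint.h>
#include <stdio.h>

#define MAX_LINKS 128

/* Conceptual model of process 800: one entry per configured egress link,
 * each mapping the link's identification to the location of its status bit. */
typedef struct {
    uint32_t link_id;        /* match field: identification of the link   */
    uint32_t status_index;   /* action data: bit position in status table */
} failure_entry_t;

static failure_entry_t table[MAX_LINKS];
static uint64_t        link_status[MAX_LINKS / 64];
static unsigned        num_entries;

static void configure_link(uint32_t link_id, uint32_t status_index) {
    /* 805: assign the status bit and mark the link operational. */
    link_status[status_index / 64] |= 1ULL << (status_index % 64);
    /* 810/815: install the match field and the data its action needs. */
    table[num_entries++] = (failure_entry_t){ link_id, status_index };
}

int main(void) {
    configure_link(/*link_id=*/17, /*status_index=*/0);
    configure_link(/*link_id=*/23, /*status_index=*/1);
    printf("%u match-action entries installed\n", num_entries);
    return 0;
}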

Process 800 in some embodiments utilizes a programming language that is designed to program packet forwarding data planes in order to program the match-action table. For instance, some embodiments utilize a programming language such as P4, which is used for programming protocol-independent packet processors. The P4 language works in conjunction with protocols such as OpenFlow and is designed to program the match-action tables.

FIG. 9 conceptually illustrates the steps hardware forwarding element 505 of FIG. 5 takes to mark a failed link in the data plane in some embodiments. The figure shows the ingress pipeline of the forwarding element. As shown, packet generator 510 receives the identification 525 of a failed egress link (i.e., a failed egress path or port). The packet generator generates a packet 905 that includes the identification of the failed link (or the failed port) in a predetermined field of the packet header. In other words, the packet includes a specific signature for the failed link that is used to match a preprogrammed match field of a match-action table in the MAU. The packet is then placed into the packet pipeline of the forwarding element through the packet generator port 520. The parser 150 then parses the packet header and creates a PHV. One of the registers or containers in the PHV includes the identification of the failed link.

FIG. 10 conceptually illustrates generation of a PHV by a parser from a packet that is generated by a packet generator in some embodiments. As shown, the packet generator 510 generates a packet 905 that includes the identification of the failed link in a predetermined field 1005 of the packet header 1010. In this example, other fields of the packet header do not include relevant information.

When a packet is received by the parser 150, the parser parses the packet headers into the PHV 1025. However, not every header field of each packet header is needed by the MAU stages of the upcoming ingress or egress pipeline to which the parser sends the PHV. For instance, some of the packet header fields will (i) not be matched against by any of the match entries of the match tables in the pipeline and (ii) not be modified by any possible action entry that could be performed in the pipeline. Thus, as the parser 150 extracts each packet header from a packet, the parser determines which of the header fields of the packet header might be processed by at least one of the match-action stages of the MAU.

The illustrated example shows that a packet header 1015 of the packet 905 includes several participating header fields 1005-1010 that the MAU is configured (e.g., by a configurator module of the control plane) to potentially process. At the same time, the packet header 1015 also includes several other non-participating header fields 1020 that the MAU is not configured to process. In some embodiments, when the parser 150 extracts a particular packet header from a packet, the parser must extract the entire contiguous packet header at once (i.e., the parser cannot leave certain fields of a packet header in the payload while placing the other fields of the packet header in the PHV). Because the different participating header fields of the packet header are often not placed next to each other in the packet header (as illustrated in the figure), the parser of some embodiments separates these participating header fields from the non-participating fields during extraction of the packet header.

For example, the MAU might be configured to process only a particular set of header fields in a UDP packet header, which may not be the first two header fields of the packet header (i.e., the source and destination ports). In such a case, the parser locates the particular header fields in the set, pulls these fields out of the packet header, and stores the header fields in the PHV. However, the other non-participating header fields that are also extracted from the packet have to be dealt with as well. Therefore, in some embodiments, the parser looks at each header field in the packet header and determines whether the identified header field might be processed by the MAU or will definitely not be processed by the MAU.

If the parser 150 determines that the header field is one of the participating header fields, the parser stores the header field in the PHV 1025 (i.e., in a particular set of registers or containers of the PHV 1025 designated for that header field). On the other hand, if the parser determines that the identified header field is not supposed to be processed by the MAU, the parser stores the header field in a separate structure (not shown) that is subsequently sent directly to the deparser of the pipeline without getting processed.

The parser of some embodiments determines which fields of each packet header may be processed and which fields will not be processed by the MAU based on the information the parser receives from the packet itself (e.g., from one or more particular packet headers of the packet) and based on the configuration data that is received, for example, from a compiler in the control plane. In some embodiments, the compiler receives the data required for configuring the pipeline (e.g., through programming language code such as the above-mentioned P4 language), generates a set of configuration data, and distributes the generated data to a configurator module (also in the control plane). The configurator module then distributes the configuration data to both the parser and the MAU of the pipeline in the forwarding element (e.g., at run-time or during setup time). For the packet 905 that is generated by the packet generator 510 for the purpose of identifying a failed link, the relevant information 1005 is in a predetermined field of the packet header 1015. This information is extracted by the parser 150 and is placed in a predetermined register (or container) 1030 of the PHV 1025.
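
The following C sketch models this parsing step under assumed field offsets and container indices: the predetermined header field carrying the failed-link identification is copied out of the generated packet and placed into its designated PHV container. The offsets, container index, and names are illustrative assumptions.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PHV_FAILED_LINK_REG 30   /* designated 32-bit container (illustrative) */
#define FAILED_LINK_OFFSET   4   /* byte offset of the field in the header     */

/* Simplified PHV: only the 32-bit containers matter for this example. */
typedef struct { uint32_t c32[64]; } phv_t;

/* Parse the generated packet: copy the predetermined header field holding
 * the failed-link identification into its designated PHV container. */
static void parse_generated_packet(const uint8_t *pkt, phv_t *phv) {
    uint32_t failed_link;
    memcpy(&failed_link, pkt + FAILED_LINK_OFFSET, sizeof failed_link);
    phv->c32[PHV_FAILED_LINK_REG] = failed_link;
}

int main(void) {
    uint8_t pkt[64] = {0};
    uint32_t failed_link = 42;
    memcpy(pkt + FAILED_LINK_OFFSET, &failed_link, sizeof failed_link);

    phv_t phv = {0};
    parse_generated_packet(pkt, &phv);
    printf("PHV container %d = %u\n", PHV_FAILED_LINK_REG,
           phv.c32[PHV_FAILED_LINK_REG]);   /* prints 42 */
    return 0;
}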

Referring back to FIG. 9, the PHV passes through the pipeline of match and action stages 915-925. One of these match-action stages 920 is preprogrammed (e.g., as described above by reference to process 800) to match the identification of the failed link included in the PHV. The match entry 930 matches the identification of the failed link. The corresponding action entry 935 includes instructions for an ALU 945 (as described above by reference to FIGS. 7 and 8) to (i) determine the location of the link's status bit in the link status table 230 based on the link identification in a predetermined field of the packet header, (ii) set the status of the link in the link status table to failed (e.g., to set the bit to 0), and (iii) drop the packet after the bit in the link status table is updated.

Depending on the particular implementation of the link status table, the action entry causes the ALU to utilize the identification of the link to calculate an index to the link status table 230. For instance, for the link status table 410 shown in FIG. 4B, the ALU may calculate a pointer to the particular link vector group 415 as well as an offset to the location of the status bit that corresponds to the failed link.

The ALU in some embodiments is capable of performing operations such as writing into the map RAM memory used to store the link status table 140. The ALU, therefore, sets (as shown by the dashed arrow 940 in FIG. 9) the status bit 910 that corresponds to the failed link to failed (e.g., to 0). The ALU then drops the packet, as there is no need for the packet to be sent out of an egress port.

FIG. 11 conceptually illustrates a process 1100 that a forwarding element performs in the data plane in order to set the status of a failed link to failed in some embodiments. As shown, different portions of the process are performed by the packet generator, the ingress pipeline parser, and the MAU of the forwarding element.

The process receives (at 1105) an indication (e.g., as shown by 525 in FIGS. 5 and 9) that an egress link of the forwarding element has failed. The process then generates (at 1110) a packet inside the forwarding element (e.g., packet 905 generated by the packet generator 510 in FIG. 9). The process includes an identification (or signature) of the failed link in the packet header. For instance, the process places the identification in the field 1005 of the packet header 1015 as shown in FIG. 10.

The process then places (at 1115) the packet in the packet pipeline of the forwarding element. For instance, the process places packet 905 into the packet pipeline of the forwarding element as shown in FIG. 9. The process then parses (at 1120) the packet and places the identification of the failed link in a predetermined register (or container) of the PHV. For instance, the process generates the PHV 1025 and places the identification of the failed link in a register 1030 of the PHV. The process then forwards (at 1125) the PHV to the MAU.

Next, the process matches (at 1130) the identification of the failed link in the PHV with the match field of the match-action entry that is preprogrammed to match the link's identification. For instance, the process matches the identification of the failed link with the match field 930 as shown in FIG. 9.

As described above, each match field has a corresponding action. Once the identification of the failed link matches a match field, the process uses (at 1135) the action that is preprogrammed for the corresponding ALU to determine the location of the link's status bit in the link status table. For instance, for the link status table 410 shown in FIG. 4B, the process may calculate a pointer to the particular link vector group 415 as well as an offset to the location of the status bit that corresponds to the failed link.

The process also sets the bit at the determined location to fail. For example, the process sets the status bit 910 to 0 as shown in FIG. 9. Once the status bit of the failed link is updated in the link status table, the packet is no longer needed and is dropped. The process then ends.

II. Identifying a Failed Egress Port and Selecting a Backup Port in Data Plane

Some embodiments assign a backup port to each egress port. These embodiments perform the following in the data plane: identify that a primary egress port has failed, mark the failed port, and redirect the packets that were destined to egress from the failed port to the backup port. Identifying the failed port, marking the failed port, and redirecting the packets to the backup port are all done in the data plane using hardware and firmware, without using the control plane and software.

As described above by reference to FIG. 6, some embodiments detect a failed port by a MAC unit and send a signal to the traffic manager. The traffic manager sends a signal to the packet generator on the pipeline that corresponds to the failed port. The packet generator then generates a packet to mark the failed port in the link status table. The embodiments that utilize backup ports maintain a data structure (referred to herein as the port status table) to keep track of the primary and backup ports.

FIG. 12 conceptually illustrates a port status table of some embodiments maintained in dual port memory that is writable by hardware. As shown, some embodiments utilize an area of memory 1205 (referred to herein as the port status table) for storing the status of the port pairs. The memory used to store table 1205 in some embodiments is a dual port memory that is capable of being read and written by hardware.

As shown, table 1205 identifies the status 1210 of each primary port and the status 1215 of each backup port. Each port is associated with a flag (e.g., a bit in the table). In the example of FIG. 12, the status 1220 of a primary port is marked as failed (i.e., set to 0) while the status 1225 of the corresponding backup port is on (i.e., set to 1).

Once a port fails, a typical solution in prior art forwarding elements is for software in the control plane to mark the port as failed and select an alternative port to replace the failed port. Utilizing software to mark a port as failed and determine a replacement port is, however, time consuming and slow. Accordingly, some embodiments provide a technique to quickly mark a failed port by performing a set of hardware and firmware operations in the data path and route packets to a backup port without software involvement.

FIG. 13 conceptually illustrates a process 1300 for assigning backup egress ports for a forwarding element and programming match-action entries to set the status of a failed port to failed. Process 1300 in some embodiments is performed when the hardware forwarding element is deployed and an initial set of egress ports is configured. The process is also performed each time a new port is configured in order to update the match-action table.

As shown, the process assigns (at 1305) a backup port to each configured egress port. The process then assigns (at 1310) a status bit in a status table (e.g., the port status table 1205 in FIG. 12) to each configured primary port and each configured backup port. The status table is stored in memory that is capable of being written by either hardware or software. The process also optionally sets the status of all ports to operational (e.g., sets the status bits to 1).

For each configured primary port, the process creates (at 1315) a match field in a match-action entry to match the identification of the primary port. For each match field created for a configured port, the process creates (at 1320) an action to (i) identify the location of the status bit of the port in the port status table, (ii) set the status of the port in the port status table to failed, and (iii) drop the packet that matched the match-action entry after the bit in the port status table is updated. The process then ends.

Process 1300 in some embodiments utilizes a programming language that is designed to program packet forwarding data planes in order to program the match-action table. For instance, some embodiments utilize a programming language such as P4, which is used for programming protocol-independent packet processors. The P4 language works in conjunction with protocols such as OpenFlow and is designed to program the match-action tables.

FIG. 14 conceptually illustrates the steps a hardware forwarding element 1405 takes to mark a failed port in the data plane in some embodiments. The figure shows the ingress pipeline of the forwarding element. As shown, packet generator 510 receives an identification 1490 of a failed egress port. The packet generator generates a packet 1405 that includes the identification of the failed port in a predetermined field of the packet header. In other words, the packet includes a specific signature for the failed port that is used to match a preprogrammed match field of a match-action table in the MAU. The packet is then placed into the packet pipeline of the forwarding element through the packet generator port 520. The parser 150 then parses the packet header and creates a PHV. One of the registers or containers in the PHV includes the identification of the failed port. For instance, the parser includes the identification of the failed port in a register such as register 1030 of the PHV as shown in FIG. 10.

The PHV passes through the pipeline of match and action stages 1415-1425. One of these match-action stages 1420 is preprogrammed (e.g., as described above by reference to process 1300) to match the identification of the failed port included in the PHV. The match entry 1430 matches the identification of the failed port. The corresponding action entry 1435 includes instructions for an ALU 1445 (as described above by reference to FIGS. 7 and 13) to (i) determine the location of the port's status bit in the port status table based on the port identification in a predetermined field of the packet header, (ii) set the status of the port in the port status table to failed (e.g., to set the bit to 0), and (iii) drop the packet after the bit in the port status table is updated.

Depending on the particular implementation of the port status table, the action entry causes the ALU to utilize the identification of the port to calculate an index to the port status table 1410. The ALU in some embodiments is capable of performing operations such as writing into the map RAM memory used to store the port status table 1410. The ALU, therefore, sets (as shown by the dashed arrow 1440) the status bit 1410 that corresponds to the failed port to failed (e.g., to 0). The ALU then drops the packet, as there is no need for the packet to be sent out of an egress port.

Once a primary egress port is marked as failed, packets that specify the failed egress port as their destination port are modified to use the backup port. This process is done in the data plane without using the control plane and software. FIG. 15 conceptually illustrates the steps a hardware forwarding element 1405 takes to replace a failed primary egress port with a backup port in the data plane in some embodiments. As shown, a packet 1505 is received through an ingress port 1590. Packet 1505 is a data packet (also referred to as a user packet) that is received from outside of the forwarding element 1405. The packet is parsed by parser 150. The parser places the header fields that might be processed by at least one of the match-action stages 1525-1530 in the PHV.

In the example of FIG. 15, the egress port identified in the packet 1505 has failed and the associated status bit 1510 in the port status table 1205 has been set to off. The PHV passes through the pipeline of match and action stages 1525-1530. One of these match-action stages 1520 is preprogrammed to match the identification of the egress port included in the PHV. The match entry 1510 matches the identification of the egress port. The corresponding action entry 1535 includes instructions for an ALU 1545 to (i) determine the location of the port's status bit in the port status table based on the port identification in a predetermined field of the packet header, (ii) check whether the port status is set to failed (e.g., to 0), and (iii) if the port has failed, set the egress port of the packet to the backup port corresponding to the failed port. The packet then proceeds through the ingress and egress pipelines and is sent out of the backup egress port.
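
The following C sketch models the failover decision described above: if the requested egress port's status bit is off, the packet is redirected to the preconfigured backup port. The table layout and names are illustrative assumptions, not the forwarding element's actual implementation.

#include <stdint.h>
#include <stdio.h>

#define MAX_PORTS 64

/* Per-port state: a status bit plus the identity of the configured backup. */
static uint8_t  port_up[MAX_PORTS];       /* 1 = operational, 0 = failed */
static uint16_t backup_of[MAX_PORTS];

/* Model of the match-action stage described above: if the packet's egress
 * port has been marked failed, substitute the preconfigured backup port. */
static uint16_t resolve_egress_port(uint16_t requested_port) {
    if (port_up[requested_port])
        return requested_port;
    return backup_of[requested_port];
}

int main(void) {
    for (int p = 0; p < MAX_PORTS; p++) { port_up[p] = 1; backup_of[p] = (uint16_t)p; }
    backup_of[12] = 13;                  /* port 13 backs up port 12 */
    port_up[12]   = 0;                   /* port 12 marked failed in the data plane */
    printf("packet destined to port 12 egresses on port %u\n",
           resolve_egress_port(12));     /* prints 13 */
    return 0;
}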

FIG. 16 conceptually illustrates a process 1600 that a forwarding element performs in the data plane in order to set the status of a failed port to failed in some embodiments. As shown, different portions of the process are performed by the packet generator, the parser, and the MAU of the forwarding element.

The process receives (at 1605) an indication (e.g., as shown by 525 in FIGS. 5 and 14) that an egress port of the forwarding element has failed. The process then generates (at 1610) a packet inside the forwarding element (e.g., packet 1405 generated by the packet generator 510 in FIG. 14). The process includes an identification (or signature) of the failed port in the packet header. For instance, the process places the identification in the field 1005 of the packet header 1015 as shown in FIG. 10.

The process then places (at 1615) the packet in the packet pipeline of the forwarding element. For instance, the process places packet 1405 into the packet pipeline of the forwarding element as shown in FIG. 14. The process then parses (at 1620) the packet and places the identification of the failed port in a predetermined register (or container) of the PHV. For instance, the process generates the PHV 1025 and places the identification of the failed port in a register 1030 of the PHV. The process then forwards (at 1625) the PHV to the MAU.

Next, the process matches (at 1630) the identification of the failed port in the PHV with the match field of the match-action entry that is preprogrammed to match the port's identification. For instance, the process matches the identification of the failed port with the match field 1430 as shown in FIG. 14.

As described above, each match field has a corresponding action. Once the identification of the failed port matches a match field, the process uses (at 1635) the action that is preprogrammed for the corresponding ALU to determine the location of the port's status bit in the port status table. For instance, for the port status table 1205 shown in FIG. 12, the process may calculate an offset to the location of the status bit that corresponds to the failed port.

The process also sets the bit at the determined location to failed. For example, the process sets the status bit 1410 to 0 as shown in FIG. 14. Once the status bit of the failed port is updated in the port status table, the packet is no longer needed and is dropped. The process then ends.
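
Putting the pieces together, a minimal end-to-end sketch of process 1600 in ordinary C might look as follows. It simply strings together the illustrative generate, parse, and bit-clear steps from the sketches above and prints the affected status word, so it can be compiled and run as a sanity check of the arithmetic rather than as a model of the hardware.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_PORTS 256
    #define FAILED_PORT_FIELD_OFFSET 8         /* assumed offset, as above      */

    static uint32_t port_status_table[MAX_PORTS / 32];

    struct generated_packet { uint8_t header[32]; };

    int main(void)
    {
        uint16_t failed_port = 42;             /* 1605: failure indication      */

        /* 1610: the packet generator writes the port id into the header.       */
        struct generated_packet pkt;
        memset(&pkt, 0, sizeof(pkt));
        pkt.header[FAILED_PORT_FIELD_OFFSET]     = (uint8_t)(failed_port >> 8);
        pkt.header[FAILED_PORT_FIELD_OFFSET + 1] = (uint8_t)(failed_port & 0xff);

        /* 1620: the parser lifts the id into a PHV container.                  */
        uint16_t phv_port_id =
            ((uint16_t)pkt.header[FAILED_PORT_FIELD_OFFSET] << 8) |
             (uint16_t)pkt.header[FAILED_PORT_FIELD_OFFSET + 1];

        /* Start with every port marked live (all status bits set).             */
        memset(port_status_table, 0xff, sizeof(port_status_table));

        /* 1630-1635: the matching action locates and clears the status bit,
         * after which the generated packet would be dropped.                   */
        port_status_table[phv_port_id / 32] &=
            ~(UINT32_C(1) << (phv_port_id % 32));

        printf("port %u, status word 0x%08x\n", (unsigned)phv_port_id,
               (unsigned)port_status_table[phv_port_id / 32]);
        return 0;
    }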

III. Computer System

FIG. 17 conceptually illustrates an electronic system 1700 with which some embodiments of the invention are implemented. The electronic system 1700 can be used to execute any of the control, virtualization, or operating system applications described above. The electronic system 1700 may be a computer (e.g., a desktop computer, personal computer, tablet computer, server computer, mainframe, a blade computer, etc.), phone, PDA, or any other sort of electronic device. Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media. Electronic system 1700 includes a bus 1705, processing unit(s) 1710, system memory 1720, read-only memory (ROM) 1730, permanent storage device 1735, input devices 1740, output devices 1745, and TCAM 1750.

The bus 1705 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1700. For instance, the bus 1705 communicatively connects the processing unit(s) 1710 with the read-only memory 1730, the system memory 1720, and the permanent storage device 1735.

From these various memory units, the processing unit(s) 1710 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments.

The read-only memory 1730 stores static data and instructions that are needed by the processing unit(s) 1710 and other modules of the electronic system. The permanent storage device 1735, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1700 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1735.

Other embodiments use a removable storage device (such as a floppy disk, flash drive, etc.) as the permanent storage device. Like the permanent storage device 1735, the system memory 1720 is a read-and-write memory device. However, unlike storage device 1735, the system memory is a volatile read-and-write memory, such as random-access memory. The system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 1720, the permanent storage device 1735, and/or the read-only memory 1730. From these various memory units, the processing unit(s) 1710 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.

The bus 1705 also connects to the input and output devices 1740 and 1745. The input devices enable the user to communicate information and select commands to the electronic system. The input devices 1740 include alphanumeric keyboards and pointing devices (also called “cursor control devices”). The output devices 1745 display images generated by the electronic system. The output devices include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as a touchscreen that function as both input and output devices.

Finally, as shown in FIG. 17, bus 1705 also couples electronic system 1700 to a network 1725 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet), or a network of networks, such as the Internet. Any or all components of electronic system 1700 may be used in conjunction with the invention.

Some embodiments include electronic components, such as microprocessors, storage, and memory, that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.

As used in this specification, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms display or displaying mean displaying on an electronic device. As used in this specification, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral or transitory signals.

While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. In addition, a number of the figures (including FIGS. 8, 11, 13, and 16) conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process.

In view of the foregoing, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

1-15. (canceled)
16. An apparatus comprising: circuitry to receive packets, wherein the packets comprise header fields and wherein the packets are associated with a flow and stored in a queue; ingress packet processing pipeline circuitry to: determine congestion associated with the queue, process received packets to generate packet byte count information and packet count information, and cause storage of the packet byte count information and packet count information; and a traffic manager coupled to the ingress packet processing circuitry.
17. The apparatus of claim 16, comprising an egress packet processing circuitry.
18. The apparatus of claim 17, comprising one or more egress ports, wherein the egress packet processing circuitry is to select an egress port of the one or more egress ports for packet transmission based on a link aggregation group (LAG).
19. The apparatus of claim 16, comprising one or more ingress ports and one or more egress ports.
20. The apparatus of claim 16, comprising at least one port that is bi-directional for packet ingress and/or packet egress.
21. The apparatus of claim 16, comprising at least one memory to store at least one of the received packets.
22. The apparatus of claim 16, comprising a switch, wherein the switch comprises the circuitry to receive packets and the ingress packet processing pipeline circuitry.
23. At least one non-transitory computer-readable medium comprising instructions stored thereon, that if executed by at least one processor, cause the at least one processor to: configure ingress packet processing pipeline circuitry of a switch to: determine congestion associated with a queue, process received packets to generate packet byte count information and packet count information, and cause storage of the packet byte count information and packet count information.
24. The non-transitory computer-readable medium of claim 23, comprising instructions stored thereon, that if executed by at least one processor, cause the at least one processor to: configure circuitry of a switch to receive packets and store received packets in a queue, wherein the packets comprise header fields and wherein the packets are associated with a flow.
25. The non-transitory computer-readable medium of claim 23, comprising instructions stored thereon, that if executed by at least one processor, cause the at least one processor to: configure egress packet processing circuitry of the switch to select an egress port of the one or more egress ports for packet transmission based on a link aggregation group (LAG).
26. The non-transitory computer-readable medium of claim 23, wherein the switch comprises one or more ingress ports and one or more egress ports.
27. The non-transitory computer-readable medium of claim 23, wherein the switch comprises at least one port that is bi-directional for packet ingress and/or packet egress.
28. The non-transitory computer-readable medium of claim 23, wherein the switch comprises at least one memory to store at least one of the received packets.
29. The non-transitory computer-readable medium of claim 23, wherein the switch comprises a traffic manager coupled to the ingress packet processing circuitry.
30. A method comprising: storing received packets in a queue, wherein the packets comprise header fields and wherein the packets are associated with a flow; ingress packet processing pipeline circuitry performing: determining congestion associated with the queue, processing received packets to generate packet byte count information and packet count information, and causing storage of the packet byte count information and packet count information.
31. The method of claim 30, comprising: egress packet processing circuitry selecting an egress port for packet transmission based on a link aggregation group (LAG).
32. The method of claim 30, comprising: performing traffic management of the received packets using a traffic manager.